Pix2Struct Infographics VQA Large
Apache-2.0
Pix2Struct is an image-encoder, text-decoder model trained through multi-task learning for visual language understanding. This checkpoint is the large variant, tuned for visual question answering on high-resolution infographics.
Image-to-Text
Transformers
Supports Multiple Languages
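
As an illustration of the visual question answering use described above, here is a minimal sketch using the Transformers library. The checkpoint name `google/pix2struct-infographics-vqa-large`, the helper `answer_question`, the `max_new_tokens` value, and the file name `infographic.png` are assumptions made for this example and are not stated in the listing; adjust them to the checkpoint and image you actually use.

```python
# Minimal VQA sketch for a Pix2Struct checkpoint (assumptions noted inline).

# Assumed Hub identifier; the listing above does not give the exact repo name.
MODEL_ID = "google/pix2struct-infographics-vqa-large"


def answer_question(image_path, question, model_id=MODEL_ID, max_new_tokens=32):
    """Return the model's text answer to `question` about the image at `image_path`.

    `answer_question` is a hypothetical helper written for this example.
    """
    # Heavy dependencies are imported lazily so the module can be imported
    # without loading PIL or transformers.
    from PIL import Image
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")

    # For VQA checkpoints the processor takes the question together with the
    # image and returns the tensors the model expects.
    inputs = processor(images=image, text=question, return_tensors="pt")

    # Generate answer tokens and decode them to a plain string.
    predictions = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(predictions[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Hypothetical local file and question, for illustration only.
    print(answer_question("infographic.png", "What is the largest category?"))
```

The large checkpoint is sizeable, so the first call downloads the weights and may be slow on CPU; moving the model and inputs to a GPU is a common adjustment.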